Spatial correlations in attribute communities 
Community detection is an important tool for exploring and classifying the properties of large complex networks and should be of great help for spatial networks. Indeed, in addition to their location, nodes in spatial networks can have attributes, such as the language of individuals or any other socio-economic feature, that we would like to identify in communities. In this paper we discuss a crucial aspect that was not considered in previous studies: the possible existence of correlations between space and attributes. Introducing a simple toy model in which both space and node attributes are considered, we discuss the effect of space-attribute correlations on the results of various community detection methods for spatial networks, proposed both in this paper and in previous studies. When space is irrelevant, our model is equivalent to the stochastic block model, which has been shown to display a detectability-non-detectability transition. In the regime where space dominates the link formation process, most methods can fail to recover the communities, an effect which is particularly marked when space-attribute correlations are strong. In this latter case, community detection methods which remove the spatial component of the network can miss a large part of the community structure and can lead to incorrect results.




I. INTRODUCTION 

Many networks are embedded in real space, and there is a cost associated with the length of links. Examples of such spatial networks can be found in infrastructures such as power grids, distribution and logistic networks, transportation and mobility networks, and also in computer science and biology, with the Internet and neuronal networks respectively (see for example the review [1]). Spatial constraints are so important in these networks that one can expect a non-trivial spatial organization, as shown in various examples [2-10].

In spatial networks, each node is described by its coor- 
dinates (usually in a 2d space) but has in general other 
attributes. For individuals, it can be any cultural or 
socio-economical parameter. For infrastructure networks 
such as power grids, it can be the voltage at the elec- 
tric substations. In general, this attribute depends on 
space and the resulting network displays entangled lay- 
ers of parameters. An important goal in the analysis of 
these networks is to disentangle these different levels and 
to extract some mesoscopic information from the spatial 
network structure. If one is interested in studying effects 
beyond space [5], one should have a straightforward way 
to 'subtract' it from the network, or in other words, to 
disentangle space and the other attributes. 

A natural tool for such a task is community detection, which has been used to characterize the properties of complex networks at a mesoscopic scale (see [11] for a review). A (real-world) community can be naturally defined as a group of network elements having the same attribute value, such as language or age for social networks, or the internet domain name for web pages. At a more quantitative level, a community can be thought of as a set of nodes more densely linked with each other than with the rest of the network [12]. Community detection procedures consist of finding these groups of nodes in the network. Various methods have been proposed so far, and we refer the interested reader to the review [11]. In particular, the Newman-Girvan method [13], which relies on the optimization of a quantity called modularity, is frequently used: despite its intrinsic limits shown in [14], it has the advantage of being simple and relatively easy to implement.

Community detection can have several purposes in spa- 
tial networks [2, 4, 15, 16], but probably the main one 
is to disentangle these various aspects, including spatial 
correlations of any type. In most cases [2, 4] communi- 
ties are determined by the geography only, which results 
from the simple fact that the most important flows are 
among nodes in the same geographical regions. In this 
sense, community detection in spatial networks offers a 
visual representation of large exchange zones. This even 
suggests that community detection might be an impor- 
tant tool in geography and in the determination of new 
administrative or economical boundaries [8]. 

In the general case, for a given network we don't know 
to what extent the existence of a link between a pair of 
nodes is due to a specific factor or to space only. The 
link could exist because of a strong attribute affinity be- 
tween the nodes, or in the other extreme case, because 
they are close neighbors. In general, one could expect 
a combination of these two effects. If we are interested 
in recovering communities defined by an attribute (such 
as language for example) from the network structure, we 
then have to consider various assumptions, such as the correlation between link formation, attribute values and space. In order to understand the effect of the underlying correlations, we can consider two extreme cases. When the links are purely spatial and independent of the attributes, removing the spatial component leaves random communities (as obtained for a random graph), which contain a random number of nodes with random attributes. In this situation, community detection is inapplicable and there is no way to recover attribute communities from the network structure. The other extreme case is when the formation of a link depends on the attributes only. In this case, space is irrelevant and any standard community detection method should give sensible results, i.e. communities made of nodes with the same attribute.

The important problem of interest here is thus the in- 
termediate case when the probability to have a link de- 
pends both on attributes and on space. In this case we 
have to eliminate spatial effects in order to recover the 
attribute structure. An important point in the discus- 
sion is then the existence of correlation between space 
and attributes. The nature and existence of these corre- 
lations will govern the way we will have to do community 
detection. In this paper, we construct a simple artificial 
network model allowing us to investigate the effect of 
these correlations on the results of the community detec- 
tion procedure. We will test various methods on this toy 
model. 



II. MATERIALS AND METHODS 

A. A BENCHMARK FOR SPATIAL NETWORKS WITH ATTRIBUTES

In order to test these ideas and how community detection acts on spatial networks, we define a simple model of spatial networks with attributes. The attributes could be anything, and we restrict ourselves, without loss of generality, to the simple binary case where the attribute can take two possible values at each node. We introduce a simple model where nodes and their attributes are randomly distributed in space. Depending on the parameters of the model, the attributes can be delocalized in space or, on the contrary, localized in some well-defined region. In some cases an attribute community could emerge in space, but our target community structure will always be the partition of the network into the two subgraphs composed of nodes with the same attribute, and we will test how well various methods recover these two communities. In this respect, the main focus of our work is to disentangle the network features due to the attributes alone from those due to the spatial arrangement of the nodes.

We construct the test (benchmark) network defining 
the vertex and edge properties in the following way. 

Vertex properties: 



1. We generate points/nodes in the 2d space (x, z) in two spatial communities, say the North and the South, around the two centers (x, z) = (0, +L) and (x, z) = (0, -L) (see Fig. 1). A simple way to do that is to generate points i around the two centers according to the probability

$$p(x_i, z_i) \propto e^{-d_{ci}/\ell} \qquad (1)$$

where d_ci is the Euclidean distance between one of the centers c and the node i of coordinates (x_i, z_i).




FIG. 1: The two spatial communities North and South are well separated, with average size ℓ = L. In the A panel we present the case ε = 0.0, where there is a perfect correlation between space and attributes (green and red colors). In the B panel, the uncorrelated case ε = 0.5 is presented, where the attribute colors are randomly distributed between the two segregated spatial communities (for the sake of clarity, only 40 out of the 100 nodes used in our simulations are shown here, and β = 1.0).

2. We assign an attribute S_i to each node i. In the following we focus on the simplest case where this attribute can take only two values S_i = ±1 (shown in this paper as the red and green colors). A simple way to control correlations between attribute and space is to choose S_i = +1 with probability q for z_i > 0 and S_i = -1 with probability 1 - q (and symmetrically for z_i < 0). In order to tune the various cases we introduce the parameter ε, with q = 1 - ε, which determines the mixing between space and attributes and ranges from 0.0 to 0.5. In the case ε = 0.0 space and attributes are strongly correlated, while for ε = 0.5 space and attributes are totally uncorrelated.

So the relevant parameters for the generation of the network nodes are ℓ and ε.
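The two vertex-generation steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the code used for the simulations: the function name `generate_nodes`, the rule that each node picks the North or South center with probability 1/2, and the mirrored attribute rule for z_i < 0 are our own assumptions. Sampling relies on the fact that a 2d density proportional to exp(-r/ℓ) around a center gives a radial distance distributed as Gamma(2, ℓ), with a uniform angle.

```python
import numpy as np

def generate_nodes(n=100, ell=1.0, L=1.0, eps=0.0, seed=None):
    """Hypothetical sketch of vertex steps 1-2: positions (Eq. 1) and attributes.

    Assumptions not fixed by the text: each node picks the North (0, +L) or
    South (0, -L) center with probability 1/2, and nodes with z_i < 0 mirror
    the rule stated for z_i > 0.
    """
    rng = np.random.default_rng(seed)
    q = 1.0 - eps
    center_z = rng.choice([L, -L], size=n)
    # p(x, z) ~ exp(-r/ell) in 2d => radial density ~ r*exp(-r/ell),
    # i.e. r ~ Gamma(shape=2, scale=ell), with a uniform angle.
    r = rng.gamma(shape=2.0, scale=ell, size=n)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    pos = np.column_stack([r * np.cos(theta), center_z + r * np.sin(theta)])
    # Majority attribute of each half-plane: +1 for z > 0, -1 otherwise.
    majority = np.where(pos[:, 1] > 0, 1, -1)
    # Keep the majority value with probability q = 1 - eps, flip it otherwise.
    keep = rng.random(n) < q
    s = np.where(keep, majority, -majority)
    return pos, s
```

Because attributes are drawn independently here, the two attribute classes have equal size only on average; enforcing exactly N/2 nodes per class would need an extra balancing step.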

Edge properties: 

3. We then construct the network: for each pair of nodes, we create a link between nodes i and j with probability p_link(i, j) ∝ exp(β S_i S_j - d_ij/ℓ_0), where ℓ_0 plays the role of the typical size of the spatial community (and where d_ij is the Euclidean distance between i and j). It is worth observing that the parameter ℓ_0 is the typical length of links when space dominates, while ℓ is the typical spatial size of the northern and southern communities. Here the relevant edge parameters are β and ℓ_0, but in order to simplify the model and to focus on the efficiency of community detection methods, we choose ℓ = ℓ_0. This choice implies that when space dominates the link formation, the links cannot be much longer than the community size. In this case, the only relevant spatial parameter is ℓ/L, and we can fix L = 1.0 so that the spatial variability is governed by ℓ. We can rewrite the probability p_link(i, j) as

$$p_{\rm link}(i,j) = \frac{1}{\mathcal{N}}\, e^{\beta S_i S_j - d_{ij}/\ell} \qquad (2)$$

where 𝒩 = Σ_{i<j} exp(β S_i S_j - d_ij/ℓ) is the normalization constant. As in the Erdős-Rényi random graph, the number of edges is a random variable with small fluctuations around its average. The number of nodes is thus fixed in each network, but not the number of edges or the average degree, which implies that we have to average our observables over different realizations of the network.

When βℓ is large, links are essentially between nodes with the same attribute (irrespective of their distance), and if βℓ is small then space is the governing factor and links are essentially between neighboring nodes.
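Step 3 can be sketched as follows, again with NumPy. The rescaling is an assumption on our part: Eq. (2) normalizes the weights so that they sum to one over all pairs i < j, so to obtain a graph with a useful number of links this sketch multiplies them by a hypothetical target mean number of edges, `mean_edges`, and clips the result at 1. The function name is also ours.

```python
import numpy as np

def generate_edges(pos, s, beta=1.0, ell=1.0, mean_edges=300, seed=None):
    """Sample links with p_link(i, j) proportional to exp(beta*S_i*S_j - d_ij/ell).

    Eq. (2) weights sum to one over pairs i < j; here they are rescaled by the
    hypothetical parameter mean_edges and clipped at 1 (an assumption).
    """
    rng = np.random.default_rng(seed)
    n = len(s)
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    w = np.exp(beta * np.outer(s, s) - d / ell)
    iu = np.triu_indices(n, k=1)                 # all pairs i < j
    norm = w[iu].sum()                           # the constant N of Eq. (2)
    p = np.clip(mean_edges * w[iu] / norm, 0.0, 1.0)
    present = rng.random(p.size) < p             # one Bernoulli trial per pair
    A = np.zeros((n, n), dtype=int)
    A[iu[0][present], iu[1][present]] = 1
    return A + A.T                               # symmetric adjacency matrix
```

Since every pair is an independent Bernoulli trial, the number of edges fluctuates around `mean_edges` (as long as no probability is clipped), which is why observables have to be averaged over realizations.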

In this way the probability associated with a link depends on both space and attributes, and the correlation between attributes and space can be controlled. If two nodes have the same attribute, the probability of a link between them is reinforced; otherwise it is weakened, the interplay being controlled by the parameter β. Concerning the spatial factor, the closer the nodes, the larger the probability associated with this link.

The generation of attributes is an important point. Since the attribute takes only two values, it is enough to specify which half (N/2) of the nodes carries one of them. In the following we thus study the specific case of an attribute community structure with communities of equal size: half of the nodes have attribute S_i = +1 and the other half have S_i = -1. We investigate here two extreme situations:

• Attributes and space uncorrelated: this case is recovered by choosing ε = 1/2.

• Attributes and space strongly correlated: for this, we choose ε small. In this case, the spatial communities are also attribute communities.



Furthermore, we can distinguish two different spatial arrangements for the northern and southern communities. The first case corresponds to a situation where the two communities are well separated, with average size ℓ < L, and spatial effects dominate the community structure (see Fig. 1). The second situation corresponds to a larger value of the average community size ℓ, where the two communities start mixing as ℓ approaches L (see Fig. 2).




FIG. 2: The two communities North and South are mixing with each other, their average size ℓ approaching the value of L (in this case ℓ = 2L). In the A panel, we display the case ε = 0.0: even if the spatial correlation is fading away, the space-attribute correlation is still strong enough to display an attribute community. In the B panel, we show the extreme case ε = 0.5, where the attributes are not correlated with space. In this case spatial mixing destroys the attribute community structure (for the sake of clarity, only 40 out of the 100 nodes used in our simulations are shown here, and β = 1.0).

There are many proposals in the literature for network benchmarks (see for example [17]), but this is, to our knowledge, the first one which takes into account the correlation between space and node attributes.



B. TESTING VARIOUS METHODS 

The interplay between space and attributes can lead to various situations that need to be understood within the framework of community detection. Indeed, we have two main regimes, βℓ ≫ 1 and βℓ ≪ 1 (see also Table I):



(a) βℓ ≫ 1. In this case, the spatial component of the links becomes irrelevant (see Eq. 2), and for a given value of β the community structure due to the node attributes emerges, independently of the correlation between space and attributes. In this regime any community detection method should work.
Spatial correlation ε | βℓ ≪ 1: space is the governing factor | βℓ ≫ 1: the spatial component of the links is irrelevant
Spatially correlated (ε ≃ 0.0) | Links are between neighboring nodes, but the spatial communities correspond to the attribute ones. Any regular community detection will work. | Links are between nodes with the same attribute. Any community detection method should work.
Spatially uncorrelated (ε = 0.5) | Links are between neighboring nodes, but the attributes are anywhere in space. It is necessary to 'remove' space in order to uncover the attribute communities. | Links are between nodes with the same attribute. Any community detection method should work.

TABLE I: The table gives an account of the behaviour of the model in the regimes βℓ ≪ 1 and βℓ ≫ 1, both in the correlated (ε = 0.0) and uncorrelated (ε = 0.5) case.



(b) βℓ ≪ 1. Here we have two subcases, depending on the correlation between space and attributes:

• (ε = 0.0) Space and attributes are correlated: any regular community detection method will work and, moreover, if one carefully removes the spatial effect, the attribute community structure will be recovered.

• (ε = 0.5) Space and attributes are uncorrelated: in this case the links are between neighboring nodes but the attributes are anywhere in space. Standard community detection methods will not work, and it is then necessary to 'remove' space in order to uncover the attribute communities.

The general question addressed by our model is to what extent it is possible to detect communities in the presence of a spatial influence. Without space the initial situation is clear: we have two communities by construction, and the probability for two nodes to be connected is related to their attribute similarity. Nodes with S = +1 tend mainly to connect to each other, and the same holds for the S = -1 nodes. If we then put the nodes in space and enhance the connection probability of nearby nodes, it is not clear whether a regular community detection method is able to detect the original two-community structure. Correlations between space and attributes can thus be misleading, and any community detection method for spatial networks should take this problem into account. There are now many community detection methods [11], and in the following we use modularity optimization, introduced by Newman and Girvan [13]. This method suffers from various problems, the most important being the existence of a resolution limit [14], which prevents it from detecting smaller modules, but it is simple enough to implement. In addition, our point here is to understand the effect of space-attribute correlations on community detection, not to compare various methods. In the following we thus essentially probe the Newman-Girvan method and the variants proposed here and in [5], for cases where space and attributes have different degrees of correlation.

The modularity function to be optimized is defined as [13]:

$$Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - P_{ij}\right]\delta(C_i, C_j) \qquad (3)$$

where the sum is over all node pairs, A is the adjacency matrix, m is the total number of edges and P_ij is the expected number of edges between the vertices i and j for a given null model. The δ function gives a null contribution for pairs of vertices not belonging to the same community (C_i ≠ C_j). For an unweighted network, one can choose P_ij = k_i k_j/(2m), where k_i is the degree of node i, which amounts to taking as a null model a random network with the same degree sequence as the original network. In order to introduce space explicitly, the idea is to change the null model defined by P_ij and to compare the actual network with this null model. Recently, such a proposal was made in [5], where the quantity P_ij is obtained directly from the data describing the network. More precisely, Expert et al. [5] used the following form

$$P_{ij}^{\rm Data} = N_i N_j f(d_{ij}) \qquad (4)$$

where N_i is related to the importance of node i (such as its population, for example). This form is reminiscent of the gravity model for traffic flows (see for example [18]), where flows are proportional to the product of populations and decrease with distance. In [5], the authors proposed to estimate the unknown function f directly from the empirical data by

$$f(d) = \frac{\sum_{i,j \,|\, d_{ij}=d} A_{ij}}{\sum_{i,j \,|\, d_{ij}=d} N_i N_j} \qquad (5)$$

which can be seen as the probability for two nodes at distance d to be connected. Note that there is a binning procedure hidden in Eq. (5). The usual way to proceed in these cases consists in introducing a discretization of space into bins that capture classes of distances. Following [5], we performed a binning of distances, selecting the best value for the number of bins after a detailed stability study of the distributions obtained from the data.
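For the dense N = 100 networks used here, Eqs. (3)-(5) can be sketched directly with NumPy matrices. This is an illustration, not the implementation of [5]: the helper names, the equal-width distance bins and the default `n_bins` are hypothetical (the text selects the number of bins through a stability study), and N_i defaults to 1 for every node because the benchmark does not define a node importance.

```python
import numpy as np

def modularity(A, P, labels):
    """Eq. (3): Q = (1/2m) * sum_ij [A_ij - P_ij] * delta(C_i, C_j)."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    return float(((A - P) * same).sum() / A.sum())   # A.sum() equals 2m

def null_newman_girvan(A):
    """Configuration-model null model P_ij = k_i k_j / (2m)."""
    k = A.sum(axis=1)
    return np.outer(k, k) / A.sum()

def null_data(A, dist, N=None, n_bins=10):
    """Eqs. (4)-(5): P_ij = N_i N_j f(d_ij), f estimated per distance bin.

    n_bins and the equal-width bins are hypothetical choices; N defaults to
    ones because the benchmark defines no node importance (assumption).
    """
    n = A.shape[0]
    N = np.ones(n) if N is None else np.asarray(N, dtype=float)
    NN = np.outer(N, N)
    off = ~np.eye(n, dtype=bool)                      # ignore self-pairs
    edges = np.linspace(dist[off].min(), dist[off].max(), n_bins + 1)
    b = np.clip(np.digitize(dist, edges) - 1, 0, n_bins - 1)
    num = np.bincount(b[off], weights=A[off], minlength=n_bins)
    den = np.bincount(b[off], weights=NN[off], minlength=n_bins)
    f = np.divide(num, den, out=np.zeros(n_bins), where=den > 0)
    P = NN * f[b]
    P[~off] = 0.0
    return P
```

With this binned estimator, Σ_ij P_ij^Data equals Σ_ij A_ij = 2m, so, as for the Newman-Girvan null model, the partition with a single community has Q = 0.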

Expert et al. [5] applied this method to the specific case of the phone network in Belgium, and tried to reconstruct linguistic communities (Flemish and French) beyond the spatial location of individuals. This choice is probably the best one if there are no correlations between the attribute under study (in their case the linguistic membership of the people calling each other) and space. In this specific case, extracting the spatial dependencies of the nodes from the actual link distribution present in the network data is the most effective way to subtract the spatial component. If, on the contrary, there are correlations between space and node attributes, the data contain the two types of information (space and attribute) in an unknown proportion, and the method needs to be reformulated. One possible way to do this is to explicitly guess a spatial dependency of the link distribution and to put it as an independent factor in the definition of the optimization function. In order to deal with the correlated case and to remove the spatial effect only, we thus propose the following explicit function of space for P_ij:



$$P_{ij}^{\rm Spatial} = \frac{1}{Z}\, k_i k_j\, g(d_{ij}) \qquad (6)$$



where Z is the normalization constant, k_i the degree of node i, and d_ij the Euclidean distance between nodes i and j. The function g(d) is a decreasing function of distance, and its role is to remove the spatial effect. A simple choice is

$$g(d) = e^{-d/\bar{\ell}} \qquad (7)$$

where ℓ̄ is the average distance between nodes in the network. Of course ℓ̄ is a rough approximation of the real value of ℓ, but we will see in the following that it is enough to capture the essence of the spatial signature of the network.
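The Spatial null model of Eqs. (6)-(7) can be sketched in the same matrix style. The text only calls Z "the normalization constant"; here we assume it is fixed so that Σ_ij P_ij = 2m, as for the Newman-Girvan null model, and we set the diagonal (self-pairs) to zero. The function name is hypothetical. The returned matrix can be used as P in any implementation of Eq. (3).

```python
import numpy as np

def null_spatial(A, dist):
    """Eqs. (6)-(7): P_ij = k_i k_j g(d_ij) / Z with g(d) = exp(-d / ell_bar).

    ell_bar is the average inter-node distance. Z is chosen so that
    sum_ij P_ij = 2m (assumption: the text does not specify Z further).
    """
    n = A.shape[0]
    off = ~np.eye(n, dtype=bool)
    ell_bar = dist[off].mean()                 # average distance between nodes
    k = A.sum(axis=1)                          # degrees k_i
    P = np.outer(k, k) * np.exp(-dist / ell_bar)
    P[~off] = 0.0                              # no self-pairs
    Z = P.sum() / A.sum()                      # makes sum(P) equal to 2m
    return P / Z
```

Unlike the Data null model, this P depends on the network only through the degrees and the positions, so correlations between attributes and link lengths are not absorbed into the null model.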

We now need a method to compare the community structure obtained with modularity optimization with the one expected from the attribute membership. Many proposals have been introduced [19-21], and we decided to use here the Jaccard Index [22, 23]. This index is an extension of the Rand index [24] and is considered to be one of the most robust measures for the clustering and classification assessment of graphs [25]. If C is the partition to be evaluated and C′ the reference one, the definition is as follows

$$J_I = \frac{a}{a + b + c} \qquad (8)$$

where a is the number of vertex pairs that are in the same community in both C and C′, b is the number of pairs that are in different communities in C but in the same one in C′, and c is the number of vertex pairs that are in the same community in C but not in C′. The quantity J_I lies in the interval [0, 1], and the closer it is to one, the better the agreement between the two partitions. For J_I = 1 there is a perfect match between the two community structures; in our case, this means that the attribute communities are exactly detected. For values of J_I less than 1, the discrepancy can depend on the sizes of the communities in the partitions and/or on their number, and in this respect the Jaccard Index is a good method to compare a very heterogeneous range of community structures.
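Counting the pairs a, b and c of Eq. (8) is straightforward; a minimal Python sketch follows (the function name is ours). It loops over all O(N^2) vertex pairs, which is negligible for N = 100. Returning 1.0 when a + b + c = 0 (both partitions made only of singletons) is our own convention, not part of the definition above.

```python
from itertools import combinations

def jaccard_index(C, C_ref):
    """Eq. (8): J_I = a / (a + b + c), counted over all vertex pairs.

    C and C_ref give the community label of each vertex (by position).
    a: pairs together in both partitions
    b: pairs separated in C but together in C_ref
    c: pairs together in C but separated in C_ref
    """
    a = b = c = 0
    for i, j in combinations(range(len(C)), 2):
        same, same_ref = C[i] == C[j], C_ref[i] == C_ref[j]
        if same and same_ref:
            a += 1
        elif same_ref:
            b += 1
        elif same:
            c += 1
    # Convention (assumption): all-singleton partitions count as a perfect match.
    return a / (a + b + c) if a + b + c else 1.0
```

For example, merging all four vertices of the reference partition {0, 1}, {2, 3} into a single community gives a = 2, b = 0, c = 4, hence J_I = 1/3; the index is also insensitive to how communities are labeled.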

In order to get a more intuitive picture of the Jaccard index, we show three different cases in Fig. 3 for the same value βℓ = 0.2 (with ε = 0.0, ℓ = 1.0 and L = 1.0) but with different values of J_I. The first case corresponds to a relatively small value J_I = 0.232 (obtained with the 'Data' method of [5], with the binning done as in their paper) and shows a partition in four communities, instead of the two associated with the attributes in red and green colors. For intermediate values such as J_I = 0.579 (obtained with our 'Spatial' method), the communities reduce to three, with a prevalence of circles in the northern part and of triangles in the southern part (see B panel in Fig. 3). The last case (obtained with the original Newman-Girvan formulation) corresponds to a value J_I = 0.903, which almost recovers the attribute community structure.

Finally, in order to have a baseline value, we also computed the average Jaccard index for a completely random partition of N = 100 nodes, obtaining J_I = 0.08 ± 0.05.



III. RESULTS 

The goal of spatial community detection is to subtract the spatial component and to recover the (two) attribute communities. We thus consider three community detection methods: the original Newman-Girvan method, the 'Data' method proposed in [5], and our 'Spatial' method defined by the null model of Eq. (6). In order to understand their limits, we test them against the benchmark network introduced above.

FIG. 3: Three spatial network configurations are presented for the constant value βℓ = 0.2 and the correlated case ε = 0.0, with ℓ = 1.0 and L = 1.0. The colors (red and green) are the attributes, while the geometrical shapes represent the community memberships found with the various community detection procedures discussed in this paper. In the A panel, we present the case J_I = 0.232, obtained with the Data method. Consistently with the low J_I value, four communities are present (instead of the two associated with the attributes in red and green colors) and they are also mixed up between the southern and northern spatial regions. In the B panel we show the J_I = 0.579 case obtained with the Spatial method. Three communities are present, with a prevalence of circles in the northern part and of triangles in the southern part. The C panel displays the case J_I = 0.903 obtained with the Newman-Girvan formulation, where the attribute community structure is almost completely recovered.

We will now see how these three methods perform in the two extreme cases of attributes correlated (ε = 0) and uncorrelated (ε = 0.5) with space, varying both the size ℓ of the spatial communities and the attribute linkage strength β. The size of the test network is N = 100 nodes, and the number of links depends on the probability defined in Eq. (2). We generated 100 network realizations for each set of parameters (β, ℓ, ε and L = 1). For each point of the simulation curves, the error bars are the standard deviation over the 100 modularity optimizations. To optimize the modularity we used the Louvain method [26].
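Common Louvain implementations (for instance the one in NetworkX) optimize the Newman-Girvan modularity, possibly with a resolution parameter. To illustrate how an arbitrary null model P (Newman-Girvan, Data or Spatial) enters the optimization, here is a simplified sketch of only the first, local-moving phase of Louvain applied to the modularity matrix B = A - P. It omits the community-aggregation phase, so it is a hypothetical stand-in and not the optimizer used for the results reported here. Moving node i from community a to community b changes the sum in Eq. (3) by 2(Σ_{j∈b} B_ij - Σ_{j∈a, j≠i} B_ij), so each node greedily joins the community with the largest such sum.

```python
import numpy as np

def local_moving(A, P, seed=None, max_sweeps=100):
    """Greedy node moves that increase Q of Eq. (3) for a given null model P.

    Only the local-moving phase of Louvain (no aggregation): a simplified,
    hypothetical stand-in for the full method.
    """
    rng = np.random.default_rng(seed)
    B = A - P                                   # modularity matrix
    n = B.shape[0]
    labels = np.arange(n)                       # start from singletons
    for _ in range(max_sweeps):
        moved = False
        for i in rng.permutation(n):
            row = B[i].astype(float)            # copy of row i
            row[i] = 0.0                        # exclude the node itself
            # gain[c] = sum of B_ij over the current members j of community c
            gain = np.bincount(labels, weights=row, minlength=n)
            best = int(np.argmax(gain))
            if gain[best] > gain[labels[i]] + 1e-12:
                labels[i] = best
                moved = True
        if not moved:                           # local optimum reached
            break
    return np.unique(labels, return_inverse=True)[1]   # relabel to 0..K-1
```

Empty labels have zero gain, so a node can also leave a community when all its links there are weaker than expected under P.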

The behavior of the model depends on both parameters β and ℓ, and we first show the case with a fixed attribute strength β. We show in the A panel of Fig. 4 the correlated case (ε = 0) with a fixed β = 1.0.
FIG. 4: The community structure obtained for various values of ℓ with fixed β = 1.0. Each point represents the average Jaccard index over 100 community detections and the error bar is its standard deviation. The correlated case ε = 0 is shown in the A panel, and in the B panel we show the uncorrelated case ε = 0.5. In A, for the regime βℓ ≪ 1, both the Newman-Girvan and the 'Spatial' formulations give the right attribute community structure, corresponding to the Jaccard index J_I = 1.0. For the regime βℓ ≫ 1 all three formulations work well, since the links due to attribute similarity are strong enough to preserve the community structure irrespective of the location of the nodes. In the uncorrelated case (B panel), the Data-based formulation performs better than the Spatial formulation, since it correctly extracts the spatial information directly from the data. In any case, both spatial methods reach the right attribute community structure at almost the same value ℓ ≈ 1.0. The standard Newman-Girvan formulation instead fails to detect the correct result up to values of ℓ ≈ 1.8. Note that on the x-axis we considered only values equal to or above 0.3, since we verified that below this value the model generates disconnected networks.

In this case, for βℓ ≫ 1, all three methods work well, as expected, and we obtain a perfect match (J_I = 1) between the community structure resulting from modularity optimization and the attribute communities. Space is not relevant in this regime and links exist essentially among nodes with the same attribute. For βℓ ≪ 1, both the Newman-Girvan modularity and the 'Spatial' method give the correct result. The latter subtracts only the spatial dependency, while the 'Data' method mixes the space effect with the correlated attribute feature, resulting in a wrong community detection. For a sufficiently large value of ℓ, the 'Data' method nevertheless approaches the correct value J_I = 1.0.

In the uncorrelated case (Fig. 4, B panel) and for low values of βℓ, the Newman-Girvan modularity is not able to detect the right attribute communities, since the attribute correlation is not strong enough to group together nodes of the same type. The other two methods perform better, since they are able to correctly eliminate the effect of space and recover the attribute community structure, even for a small attribute correlation. The formulation based on Data performs even better, since it eliminates the effect of space almost pointwise, but in any case the correct result J_I = 1 is reached at almost the same value ℓ ≈ 1.0 for both spatial methods.

In Fig. 5 we show the results for the case of a fixed community size (ℓ = 1.0) where we vary the attribute strength β. In the A panel the correlated case is presented (ε = 0). As expected, the 'Data' method has problems detecting the attribute community structure for low values of β, and only for high attribute strengths β does it start to correctly detect the target communities. In the uncorrelated case (B panel), where space and attributes are independent, the standard Newman-Girvan formulation fails, while the two spatial methods both perform better.

In order to summarize these results, we show in Table II the only relevant regime (b) defined above, βℓ ≪ 1 (the regime (a), βℓ ≫ 1, is trivial, as can be verified in Figs. 4 and 5), for all the parameters of interest (ε, ℓ and β) and for the three community detection methods. This table shows that the Spatial method is a very good compromise in all situations, while to get the best performance one has to choose the suitable method for each specific case.
FIG. 5: The community structure obtained for various values of β with fixed community size ℓ = 1.0. Each point represents the average Jaccard index over 100 community detections and the error bar is its standard deviation. The correlated case ε = 0 is shown in the A panel, and in the B panel we show the uncorrelated case ε = 0.5. In the correlated case the 'Data' method fails to detect the attribute community structure for all the β regimes present in the figure, while the other two methods start working at β = 0.8. In the uncorrelated case the Newman-Girvan method is not able to detect the attribute community structure, while the two spatial methods perform similarly well, approaching the correct value J_I = 1.0 around β = 0.8.



We note that the behavior of the error bar sizes in Fig. 4 is interesting. For βℓ ≪ 1 and βℓ ≫ 1, the error in the modularity estimate is relatively small. The error bars (or, equivalently, the fluctuations of the Jaccard index) are largest for βℓ ∼ 1. In this region the community detection methods are thus more sensitive to small fluctuations of the network, which implies a peak in the 'susceptibility' of the system. This behavior is reminiscent of the phase transition between detectability and non-detectability presented in [27, 28]. Indeed, in Fig. 6 we show the limiting case ℓ ≫ L (numerically we choose ℓ = 4 and L = 1), for which the effect of space is irrelevant. In this limit our model becomes equivalent to the stochastic block model of [28] with q = 2 possible values of the attribute. In our case the control parameter (c_out/c_in in [28]) is exp(−2β), while the order parameter is the Jaccard index. It is clear from Fig. 6 that the same effect is present (see Fig. 2 in [28]), even if the critical point is shifted because of the different community detection method and the different definition of the order parameter. Moreover, in contrast with the result in [28], in the undetectable regime (β = 0) the order parameter does not vanish. As mentioned above, for a completely random partition J_I = 0.08 ± 0.05. Our values lie slightly above this because even a random network can have a positive modularity [29]: maximizing the modularity therefore selects, among all possible partitions, a subset with a higher average modularity and consequently a higher average Jaccard index.
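For orientation, the transition in the space-free limit can be compared with the known detectability threshold of the sparse stochastic block model with q groups of equal size (Decelle, Krzakala, Moore and Zdeborová), |c_in − c_out| = q√c, where c is the average degree. The hedged sketch below, which is not part of this paper's analysis, maps that threshold for q = 2 onto β through the relation c_out/c_in = exp(−2β) stated above. The average degree c is left as a free parameter because its value in the simulations is not given in this section, and, as noted above, modularity-based methods with the Jaccard order parameter show a shifted critical point, so β* is only a reference value. The name `critical_beta` is hypothetical.

```python
import math


def critical_beta(c):
    """Reference detectability threshold for q = 2, expressed through beta.

    Assumes c_out / c_in = exp(-2 * beta) and average degree c = (c_in + c_out) / 2.
    At the threshold c_in - c_out = 2 * sqrt(c), hence
    c_in = c + sqrt(c), c_out = c - sqrt(c) and
    c_out / c_in = (sqrt(c) - 1) / (sqrt(c) + 1).
    Below the returned beta* the planted groups are undetectable.
    """
    if c <= 1:
        raise ValueError("a finite threshold requires average degree c > 1")
    ratio = (math.sqrt(c) - 1.0) / (math.sqrt(c) + 1.0)
    return -0.5 * math.log(ratio)
```

Because the critical ratio grows toward 1 as c increases, denser graphs become detectable at smaller β.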
[Table II body: ratings B, G or VG for each method, with columns for spatial correlation ε = 0.0 (correlated) and ε = 0.5 (uncorrelated)]
TABLE II: Performance of the three methods (Newman-Girvan, Data and Spatial), as extracted from Figs. 4 and 5, in the only non-trivial regime βℓ ≲ 1, for both the correlated (ε = 0.0) and the uncorrelated (ε = 0.5) case. Since in the plots we vary both ℓ and β, we distinguish these two cases here. To compare these results, we classify them according to the following criteria: B, G and VG, standing for Bad, Good and Very Good. We assign VG when there is very good agreement with the target attribute community structure (J_I very close to 1), G when the curve rapidly approaches the correct result even for low to medium values of the parameters ℓ and β, and B when the method completely fails to recover the correct community structure.



[Fig. 6 plot: Jaccard index versus exp(−2β) for the Data, Spatial and Newman-Girvan methods]

FIG. 6: Transition from the detectable to the undetectable community structure regime, obtained in the case ℓ ≫ L. This transition was described in [28] for the stochastic block model, which corresponds to our model with q = 2 attributes when the effect of space is absent, i.e. for large ℓ (ℓ = 4.0 in the actual simulation). The control parameter is then exp(−2β) and the Jaccard index is our order parameter. All three community detection methods discussed in this paper display the same behavior, adding evidence for the universality of the transition presented in [28].

We thus recover the results of [28]; in addition, our results seem to point to the existence of a spatial phase transition that is actually independent of the community detection method used.

Finally, we compared the performance of the Data and Spatial formulations by looking at the J_I values when varying the parameter ε at fixed βℓ (see Fig. 7). For each value of ε, a higher J_I value signals better behavior, since it is closer to the maximum value J_I = 1. We first chose βℓ = 0.8 (we also tested βℓ = 1.0, which gives similar results). There is a crossover in performance around ε ≈ 0.25. Below this value the Spatial method performs better, while above it the Data method does slightly better. This result thus shows that there can be a non-negligible range of correlations (measured here by ε) for which the results of spatial community detection can be incorrect.


FIG. 7: Performance of the Spatial and Data modularity formulations. We show here the case βℓ = 0.8, where there is a crossover in performance around ε ≈ 0.25. Below ε = 0.25 the Spatial method performs better, and above it the Data method is slightly better.
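The crossover quoted for Fig. 7 is the value of ε at which the two J_I(ε) curves intersect. A minimal way to locate such a point from sampled curves, assuming both methods are evaluated at the same increasing ε values, is to interpolate linearly the zero of the difference J_I(Spatial) − J_I(Data). The function `crossover` and its inputs are hypothetical, and no values from Fig. 7 are used.

```python
def crossover(eps, j_spatial, j_data):
    """Estimate the eps at which the Data curve overtakes the Spatial one.

    eps: increasing sample points of the correlation parameter.
    j_spatial, j_data: Jaccard values of the two methods at those points.
    Returns the interpolated eps where j_spatial - j_data first goes from
    positive to non-positive, or None if this never happens on the samples.
    """
    if not (len(eps) == len(j_spatial) == len(j_data)):
        raise ValueError("all sequences must have the same length")
    diff = [s - d for s, d in zip(j_spatial, j_data)]
    for k in range(1, len(eps)):
        d0, d1 = diff[k - 1], diff[k]
        if d0 > 0 >= d1:
            # Zero of the straight line joining the two samples; d0 - d1 > 0 here.
            return eps[k - 1] + (eps[k] - eps[k - 1]) * d0 / (d0 - d1)
    return None
```

Linear interpolation is the simplest choice; with the sizable error bars discussed above, the crossover is better reported with an uncertainty, for instance by repeating the estimate over independent network realizations.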



IV. DISCUSSION 

In this paper we propose a simple model which allows us to test community detection on spatial networks. Our model generates simple graphs that mix geographical properties and attributes. Many other spatial network models have been introduced in the literature, in which nodes are connected to each other according to a given spatial rule. Examples range from the growth of street networks to the evolution of territorial infrastructure networks (see [1] for an extensive list of such models). Moreover, a whole class of models describing node properties and their aggregation has recently been introduced, one of the most important being the stochastic block model, in which a combination of various kinds of node attributes is present. The novelty of our approach is to study these aspects (geography and attributes) at the same time and, to the best of our knowledge, our model is the first one that considers the two factors, space and attributes, simultaneously in the context of community detection.
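The model itself is defined earlier in the paper and is not restated in this section. Purely to illustrate how space and attributes can be mixed in a single generator, the hypothetical sketch below is built only to reproduce three properties mentioned above: (i) a spatial factor exp(−d_ij/ℓ) that becomes irrelevant when ℓ ≫ L; (ii) an attribute factor exp(−β|a_i − a_j|) with a_i = ±1, so that pairs with different attributes are suppressed by exp(−2β), the control parameter of the space-free limit; and (iii) a flip probability ε that makes spatially assigned attributes fully correlated with position at ε = 0 and uncorrelated at ε = 0.5. It should not be read as the paper's exact link probability; the prefactor p0, the half-square attribute assignment and the function name are assumptions.

```python
import math
import random


def toy_spatial_attribute_graph(n, L, ell, beta, eps, p0=1.0, seed=None):
    """Hypothetical space + attribute graph generator (not the paper's exact model).

    Returns (positions, attributes, edges) with edges as (i, j) pairs, i < j.
    """
    rng = random.Random(seed)
    # Uniform positions in an L x L square.
    pos = [(rng.uniform(0.0, L), rng.uniform(0.0, L)) for _ in range(n)]
    # Attribute set by the half of the square, then flipped with probability eps.
    attrs = []
    for x, _ in pos:
        a = 1 if x < L / 2 else -1
        if rng.random() < eps:
            a = -a
        attrs.append(a)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(pos[i], pos[j])
            # Spatial deterrence times attribute factor (exp(-2*beta) if a_i != a_j).
            p = p0 * math.exp(-d / ell) * math.exp(-beta * abs(attrs[i] - attrs[j]))
            if rng.random() < min(p, 1.0):
                edges.append((i, j))
    return pos, attrs, edges
```

Setting ell much larger than L reduces the generator to a two-group stochastic block model, while eps controls whether spatial proximity and attribute similarity point to the same partition.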

In particular, we explicitly show that the existence of correlations between attributes and space drastically affects the result of community detection. The results presented in this study show that community detection in spatial networks should be handled with great care, and that including space in community detection methods could lead to results that are difficult to interpret. We show that for weak correlations most community detection methods work, but that for stronger correlations community detection methods which remove the spatial component of the network can lead to incorrect results. It is thus important to have some information on the correlations between space and attributes in order to assess the validity of the results of community detection methods. In practical applications, however, these attribute-space correlations are generally not known, which calls for new approaches, for example community detection methods that include such correlations in some tunable form.
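As one hedged illustration of a method that includes space in a tunable way, modularity can be written with an arbitrary null model P_ij, Q = (1/2m) Σ_ij (A_ij − P_ij) δ(c_i, c_j), and the null model can interpolate between the Newman-Girvan configuration model, P_ij = k_i k_j/2m, and a spatial null model through a weight λ. The sketch below implements only this bookkeeping. The spatial null model (for example the one used by the Spatial method) is passed in as a function and is not implemented, and `mixed_null` with its parameter `lam` is a hypothetical construction, not a method proposed or tested in this paper.

```python
from collections import defaultdict


def modularity(edges, communities, null_model):
    """Q = (1/2m) * sum over i, j in the same community of (A_ij - P_ij).

    edges: undirected (i, j) pairs, no duplicates and no self-loops.
    communities: communities[i] is the community label of node i.
    null_model: function (i, j) -> expected weight P_ij, assumed symmetric.
    """
    n = len(communities)
    two_m = 2 * len(edges)
    adj = defaultdict(int)
    for i, j in edges:
        adj[(i, j)] += 1
        adj[(j, i)] += 1
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                q += adj.get((i, j), 0) - null_model(i, j)
    return q / two_m


def newman_girvan_null(edges, n):
    """Configuration-model null model P_ij = k_i * k_j / (2m)."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    two_m = 2 * len(edges)
    return lambda i, j: deg[i] * deg[j] / two_m


def mixed_null(p_ng, p_space, lam):
    """Hypothetical tunable null model: (1 - lam) * P_NG + lam * P_space."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lambda i, j: (1.0 - lam) * p_ng(i, j) + lam * p_space(i, j)
```

If both null models sum to 2m over all ordered pairs, so does their convex combination, which keeps Q = 0 for the single-community partition at every λ.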